watch a model teach itself to copy patterns
The induction head is one of the first circuits to be fully reverse-engineered inside a transformer, and a major reason models can learn from their own context. The rule it discovers: “if I saw ‘A B’ earlier and I just saw ‘A’ again, the next thing is probably ‘B’.” Train one here and watch it form.
it needs two layers to work: one head finds “the token before me,” a second uses that to find the match. same transformer-engine.js as the other lab — just a different task.
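The two-step rule can be written out by hand. This is a minimal symbolic sketch in plain Python, not the lab's transformer-engine.js code: the function names are hypothetical, and real heads do this with soft attention over vectors rather than exact lookups.

```python
def previous_token_pass(tokens):
    """Step 1 (the layer-0 role): tag every position with the token just before it."""
    # Position 0 has no predecessor, so it is tagged with None.
    return [None] + list(tokens[:-1])


def induction_predict(tokens):
    """Step 2 (the layer-1 role): find a position whose predecessor tag matches
    the current token, then copy the token sitting at that position."""
    if not tokens:
        return None
    prev = previous_token_pass(tokens)
    current = tokens[-1]
    # Scan from the most recent position backwards and take the first match.
    for pos in range(len(tokens) - 1, 0, -1):
        if prev[pos] == current:
            return tokens[pos]
    return None  # the current token was never followed by anything earlier


# "A B C A": the last A was followed by B earlier, so the guess is B.
print(induction_predict(list("ABCA")))
```

Neither step solves the task alone: the first only labels positions, and the second only works because those labels exist. That is why a single layer is not enough.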
We feed the model random sequences that repeat: a run of symbols, then the very same run again.
Nothing about the symbols is predictable — except that the second half copies the first. The only way to win is to learn the copying trick itself. So the model is forced to grow an induction head.
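A training example of this kind could be generated as follows. This is a sketch under assumptions: symbols are integer ids, and `run_len` and `vocab_size` are hypothetical parameter names, not the lab's actual settings.

```python
import random


def make_repeated_sequence(run_len, vocab_size, rng):
    """Return a random run of symbol ids followed by an exact copy of itself."""
    # Each symbol is drawn independently, so the first half carries no pattern.
    run = [rng.randrange(vocab_size) for _ in range(run_len)]
    # The second half is the only predictable part of the sequence.
    return run + run


rng = random.Random(0)  # seeded so the example is repeatable
seq = make_repeated_sequence(run_len=8, vocab_size=16, rng=rng)
print(seq[:8])
print(seq[8:])
```

Because every run is drawn fresh, memorising particular sequences does not help. Only the copying rule carries over from one example to the next.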
Loss starts at pure-guess ({{ uniform }}). When it drops sharply, an induction head has clicked into place — that cliff is the circuit forming.
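The pure-guess level has a closed form: a model that spreads probability evenly over V symbols pays a cross-entropy of ln(V) per prediction. The sketch below assumes the loss is cross-entropy measured in nats; the vocabulary size of 16 is an illustrative value, not necessarily the lab's.

```python
import math


def cross_entropy(probs, target):
    """Loss for one prediction: negative log of the probability given to the true symbol."""
    return -math.log(probs[target])


def uniform_loss(vocab_size):
    """Loss of a model that assigns 1/vocab_size to every symbol."""
    return math.log(vocab_size)


vocab_size = 16  # assumed for illustration
guess = [1.0 / vocab_size] * vocab_size
print(cross_entropy(guess, target=3))  # same value whatever the target is
print(uniform_loss(vocab_size))        # ln(16), about 2.77
```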
An induction head shows a signature offset diagonal: in the second half, each symbol looks back to just after its earlier twin. Scan layer 1's heads — one of them lights up this way once trained.
row = symbol looking · column = symbol looked at. before training it’s mush; after, hunt layer 1 for the diagonal stripe offset into the first half.
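The hunt for the stripe can be turned into a number. The sketch below assumes each head's attention is available as a square list of rows (row = symbol looking, column = symbol looked at) over a sequence of two equal halves. `induction_score` is a hypothetical helper, not part of transformer-engine.js.

```python
def induction_score(attn, run_len):
    """Average attention that second-half positions place on the symbol
    just after their earlier twin, i.e. on column i - run_len + 1."""
    rows = range(run_len, 2 * run_len)
    return sum(attn[i][i - run_len + 1] for i in rows) / run_len


def causal_uniform(n):
    """Untrained-looking pattern: each row spreads evenly over itself and earlier columns."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def perfect_stripe(run_len):
    """Idealised induction pattern: all second-half attention sits on the stripe."""
    n = 2 * run_len
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        col = i - run_len + 1 if i >= run_len else 0
        attn[i][col] = 1.0
    return attn


heads = [causal_uniform(6), perfect_stripe(3)]
scores = [induction_score(h, run_len=3) for h in heads]
best = max(range(len(heads)), key=lambda k: scores[k])
print(scores, best)
```

A score near 1 means the head attends almost entirely along the stripe. A score near the uniform baseline means it has not formed.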
The payoff: show it a fresh random run, repeat it, and at each spot in the copy ask for the next symbol. A trained induction head nails symbols it has never seen in that order before.
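That test can be sketched as a small scoring loop. The `predict` argument is a hypothetical stand-in for the trained model: any function that takes the symbols seen so far and returns a guess for the next one. The two predictors below are placeholders so the sketch runs on its own.

```python
import random


def copy_accuracy(predict, run_len, vocab_size, rng):
    """Build a fresh repeated run and score next-symbol guesses inside the copy."""
    if run_len < 2:
        raise ValueError("run_len must be at least 2")
    run = [rng.randrange(vocab_size) for _ in range(run_len)]
    seq = run + run
    hits = 0
    # The context ends at position i inside the copy; the answer is seq[i + 1].
    for i in range(run_len, 2 * run_len - 1):
        if predict(seq[: i + 1]) == seq[i + 1]:
            hits += 1
    return hits / (run_len - 1)


RUN_LEN = 8  # assumed for illustration


def ideal_copier(context):
    """Stand-in for a trained model: looks back exactly one run length."""
    return context[len(context) - RUN_LEN]


def never_right(context):
    """Stand-in for an untrained model: returns a symbol outside the vocabulary."""
    return -1


print(copy_accuracy(ideal_copier, RUN_LEN, 16, random.Random(1)))
print(copy_accuracy(never_right, RUN_LEN, 16, random.Random(1)))
```

Only positions inside the copy are scored. The first half is random, so no predictor can be expected to beat chance there.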
In-context learning. This one trick — copy what followed last time — is a big part of how LLMs pick up a pattern from your prompt with no retraining.
Two heads, composed. A layer-0 “previous-token” head writes the identity of each token’s predecessor into the residual stream; a layer-1 head reads it to find the match. Circuits built from parts.
It was found by reverse-engineering. Researchers read this circuit out of a real model’s weights, which is exactly the craft in the Reverse-Engineer a Mind lab.